63 research outputs found
Robotic manipulation of multiple objects as a POMDP
This paper investigates manipulation of multiple unknown objects in a crowded
environment. Because of incomplete knowledge due to unknown objects and
occlusions in visual observations, object observations are imperfect and action
success is uncertain, making planning challenging. We model the problem as a
partially observable Markov decision process (POMDP), which allows a general
reward based optimization objective and takes uncertainty in temporal evolution
and partial observations into account. In addition to occlusion-dependent
observation and action success probabilities, our POMDP model also
automatically adapts object-specific action success probabilities. To cope with
the changing system dynamics and performance constraints, we present a new
online POMDP method based on particle filtering that produces compact policies.
The approach is validated both in simulation and in physical experiments in a
scenario of moving dirty dishes into a dishwasher. The results indicate that:
1) a greedy heuristic manipulation approach is not sufficient, as multi-object
manipulation requires multi-step POMDP planning, and 2) on-line planning is
beneficial since it allows adapting the system dynamics model based on actual
experience.
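The particle-filtering core of such an online POMDP planner can be sketched as a generic sequential importance resampling belief update. This is a standard textbook step, not the paper's exact method, and the callback names (`transition_sample`, `observation_likelihood`) are illustrative:

```python
import random

def particle_filter_update(particles, action, observation,
                           transition_sample, observation_likelihood):
    """One belief update: propagate particles through the dynamics model,
    weight them by how well they explain the observation, and resample."""
    # Propagate each particle state through the (stochastic) dynamics model.
    propagated = [transition_sample(s, action) for s in particles]
    # Weight by observation likelihood; with occlusions, this likelihood
    # must also account for objects that are currently not visible.
    weights = [observation_likelihood(observation, s) for s in propagated]
    total = sum(weights)
    if total == 0:  # all particles inconsistent with the observation
        return propagated
    weights = [w / total for w in weights]
    # Resample with replacement according to the normalized weights.
    return random.choices(propagated, weights=weights, k=len(particles))
```

An online planner would simulate candidate action sequences against beliefs maintained this way and select the action with the highest expected reward.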
Prioritized offline Goal-swapping Experience Replay
In goal-conditioned offline reinforcement learning, an agent learns from
previously collected data to go to an arbitrary goal. Since the offline data
only contains a finite number of trajectories, a main challenge is how to
generate more data. Goal-swapping generates additional data by switching
trajectory goals, but in doing so it also produces a large number of invalid
trajectories. To address this issue, we propose prioritized goal-swapping
experience replay (PGSER). PGSER uses a pre-trained Q function to assign higher
priority weights to goal swapped transitions that allow reaching the goal. In
experiments, PGSER significantly improves over baselines in a wide range of
benchmark tasks, including challenging previously unsuccessful dexterous
in-hand manipulation tasks.
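The prioritization idea can be sketched in a few lines: relabel transitions with swapped goals and weight each relabeled transition by a pre-trained Q estimate, so transitions that plausibly reach the swapped goal are replayed more often. This is a PGSER-style sketch; the paper's exact priority definition may differ, and all names here are illustrative:

```python
import math
import random

def goal_swap_with_priority(transitions, goals, q_value, temperature=1.0):
    """Relabel each (state, action, next_state) with every candidate goal
    and attach a priority weight derived from a pre-trained Q function."""
    weighted = []
    for (state, action, next_state) in transitions:
        for goal in goals:
            # Higher Q -> the swapped goal looks reachable -> higher priority.
            priority = math.exp(q_value(state, action, goal) / temperature)
            weighted.append(((state, action, next_state, goal), priority))
    return weighted

def sample_batch(weighted, batch_size):
    """Sample relabeled transitions proportionally to their priority."""
    items = [t for t, _ in weighted]
    weights = [p for _, p in weighted]
    return random.choices(items, weights=weights, k=batch_size)
```

With this weighting, invalid goal-swapped transitions (low Q) are rarely sampled, which is the mechanism the abstract describes.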
Hierarchical Imitation Learning with Vector Quantized Models
The ability to plan actions on multiple levels of abstraction enables
intelligent agents to solve complex tasks effectively. However, learning the
models for both low and high-level planning from demonstrations has proven
challenging, especially with higher-dimensional inputs. To address this issue,
we propose to use reinforcement learning to identify subgoals in expert
trajectories by associating the magnitude of the rewards with the
predictability of low-level actions given the state and the chosen subgoal. We
build a vector-quantized generative model for the identified subgoals to
perform subgoal-level planning. In experiments, the algorithm excels at solving
complex, long-horizon decision-making problems, outperforming the state of the art.
Because of its ability to plan, our algorithm can find better trajectories than
the ones in the training set.
Comment: To appear at ICML 202
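The vector-quantization step at the heart of such a subgoal model is a nearest-codebook lookup: a continuous subgoal embedding is snapped to its closest codebook vector, giving a discrete subgoal for high-level planning. A minimal generic sketch (codebook learning and the planner itself are omitted):

```python
def quantize(embedding, codebook):
    """Map a continuous embedding to the nearest codebook vector.
    Returns the discrete code index and the quantized vector."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    index = min(range(len(codebook)), key=lambda i: sq_dist(embedding, codebook[i]))
    return index, codebook[index]

# Example: a two-entry codebook; the embedding snaps to the closer code.
index, code = quantize((0.9, 1.2), [(0.0, 0.0), (1.0, 1.0)])  # -> (1, (1.0, 1.0))
```

Planning then searches over the small discrete set of code indices rather than the continuous embedding space.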
Probabilistic approach to physical object disentangling
Physically disentangling entangled objects from each other is a problem
encountered in waste segregation or in any task that requires disassembly of
structures. Often there are no object models, and, especially with cluttered
irregularly shaped objects, the robot cannot create a model of the scene due
to occlusion. One of our key insights is that, based on previous sensory
input, we are only interested in moving an object out of the entanglement
while avoiding obstacles. That is, to plan the disentangling we only need to
know where the robot can successfully move. Due to this uncertainty, we integrate
information about blocked movements into a probability map. The map defines the
probability of the robot successfully moving to a specific configuration. Using
as cost the failure probability of a sequence of movements, we can then plan and
execute disentangling iteratively. Since our approach circumvents only
previously encountered obstacles, new movements will yield information about
unknown obstacles that block movement until the robot has learned to circumvent
all obstacles and disentangling succeeds. In the experiments, we use a special
probabilistic version of the Rapidly-exploring Random Tree (RRT) algorithm for
planning and demonstrate successful disentanglement of objects in both 2-D and
3-D simulation and on a KUKA LBR 7-DOF robot. Moreover, our approach
outperforms baseline methods.
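The probability map and the sequence-failure cost described above can be sketched concretely. Below, a map of per-configuration success probabilities is estimated from observed move outcomes with a Beta-posterior mean, and a path's cost is the probability that at least one step fails. This is a generic sketch under those assumptions; the paper's exact estimator and the probabilistic RRT are not reproduced here:

```python
class ProbabilityMap:
    """Success-probability map over discretized robot configurations,
    updated from observed movement outcomes."""
    def __init__(self):
        self.counts = {}  # config -> [successes, attempts]

    def update(self, config, success):
        s, n = self.counts.get(config, [0, 0])
        self.counts[config] = [s + int(success), n + 1]

    def success_prob(self, config):
        # Beta(1, 1) prior: unvisited configurations get probability 0.5,
        # so the robot neither trusts nor avoids unexplored regions outright.
        s, n = self.counts.get(config, [0, 0])
        return (s + 1) / (n + 2)

def path_failure_prob(prob_map, path):
    """Cost of a movement sequence: probability that any step fails,
    assuming independent step outcomes."""
    p_success = 1.0
    for config in path:
        p_success *= prob_map.success_prob(config)
    return 1.0 - p_success
```

A planner minimizing this cost naturally routes around configurations where movements have previously been blocked, while new failures feed back into the map, matching the iterative scheme in the abstract.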
- …